A French Fairy Tale Corpus syntactically and semantically annotated
Abstract
Fairy tales, folktales and, more generally, children's stories have lately attracted the attention of the Natural Language Processing (NLP) community. Yet very few corpora exist and linguistic resources are lacking. The work presented in this paper aims to fill this gap by presenting a syntactically and semantically annotated corpus. It focuses on the linguistic analysis of a Fairy Tales Corpus and describes the syntactic and semantic resources developed for Information Extraction. The resources include syntactic dependency relation annotation for 120 verbs; referential annotation, which annotates each anaphoric occurrence and Proper Name with the most specific noun in the text; ontology matching for a substantial part of the nouns in the corpus; and semantic role labelling for 41 verbs using the FrameNet database. The article also sums up previous analyses of this corpus and indicates possible uses for the NLP community.
Similar resources
First International Workshop on Lexical Resources
The ANR-funded Nomage project aims at describing the aspectual properties of deverbal nouns taken from a corpus, in an empirical way. It is centered on the development of two resources: a semantically and syntactically annotated corpus of deverbal nouns based on the French Treebank, and an electronic lexicon, providing descriptions of morphological, syntactic and semantic properties of the deverbal...
Fairy Tale Corpus Organization Using Latent Semantic Mapping and an Item-to-item Top-n Recommendation Algorithm
In this paper we present a fairy tale corpus that was semantically organized and tagged. The proposed method uses latent semantic mapping to represent the stories and a top-n item-to-item recommendation algorithm to define clusters of similar stories. Each story can be placed in more than one cluster and stories in the same cluster are related to the same concepts. The results were manually eva...
Spanish FrameNet. An on-line lexical resource and its application to NLP
SFN is creating an online lexical resource for Spanish, based on frame semantics (Fillmore 1982, 1985) and supported by corpus evidence. The basic aim of the project is to develop a semantically and syntactically annotated lexical resource with broad lexical coverage in Spanish which can be used as a training corpus for applications aimed at automatic semantic role labeling (Erk and Padó 2006)....
Short Paper: Virtual Storyteller in Immersive Virtual Environments Using Fairy Tales Annotated for Emotion States
This paper describes the implementation of an automatically generated virtual storyteller from fairy tale texts which were previously annotated for emotion. In order to gain insight into the effectiveness of our virtual storyteller we recorded face, body and voice of an amateur actor and created an actor animation video of one of the fairy tales. We also got the actor’s annotation of the fairy ...
Growing TreeLex
TreeLex is a subcategorization lexicon of French, automatically extracted from a syntactically annotated corpus. The lexicon comprises 2006 verbs (25076 occurrences). The goal of the project is to obtain a list of subcategorization frames of contemporary French verbs and to estimate the number of different verb frames available in French in general. A few more frames are discovered when the cor...